Update precision in the ONNX strided_slice, update precision of ToScalar #6272
@@ -374,7 +374,7 @@ inline bool IsEqualScalar(const Expr& a, const Expr& b) {
  * \param i element index
  * \return Converted scalar value.
  */
-static inline double ToScalar(const runtime::NDArray& array, size_t i = 0) {
+static inline long double ToScalar(const runtime::NDArray& array, size_t i = 0) {
Why is double not sufficient?
Because a double has only 52 stored bits of mantissa (53 bits of precision counting the implicit leading bit), it can't represent every int64_t value exactly.
So we get rounding errors if we pass in large int64_t values (anything whose magnitude exceeds 2^53 may be rounded).
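To make the rounding concrete, here is a minimal sketch of a round-trip check. The helper name `RoundTripsThroughDouble` is hypothetical and is not part of TVM; it only illustrates that int64_t values above 2^53 can change when they pass through a double.

```cpp
#include <cstdint>

// Hypothetical helper (not TVM code): returns true when `v` is unchanged
// after being converted to double and back to int64_t.
inline bool RoundTripsThroughDouble(int64_t v) {
  const double d = static_cast<double>(v);
  // Values near INT64_MAX round up to 2^63, which no longer fits in int64_t.
  // Converting such a double back would be undefined behaviour, so reject it.
  if (d >= 9223372036854775808.0) return false;
  return static_cast<int64_t>(d) == v;
}
```

With IEEE 754 doubles, 2^53 survives the round trip, while 2^53 + 1 does not, because it falls between two adjacent representable doubles and is rounded.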
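On x86 with GCC or Clang, long double is the 80-bit x87 extended format: 1 sign bit and a 64-bit significand (63 fraction bits plus an explicit integer bit), which is enough to hold any int64 exactly. On PowerPC it's typically a 128-bit double-double with 106 bits of mantissa, and on 64-bit ARM it's IEEE binary128 with 113 bits of precision.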
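Since the width of long double depends on the platform and compiler, one way to check the claim on a given target is to query `std::numeric_limits`. The sketch below is an illustration only; `HoldsEveryInt64` is a hypothetical name, not something defined in TVM.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical compile-time check (not TVM code): true when the binary
// floating-point type T has enough significand bits for every int64_t.
// The largest magnitude that needs all its bits is 2^63 - 1 (63 bits);
// INT64_MIN is -2^63, a power of two, so it needs only one bit.
template <typename T>
constexpr bool HoldsEveryInt64() {
  return std::numeric_limits<T>::radix == 2 &&
         std::numeric_limits<T>::digits >= 63;
}
```

A `static_assert(HoldsEveryInt64<long double>(), "long double too narrow")` placed next to the conversion would then flag toolchains where long double is no wider than double, such as MSVC.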
Thanks @mbrookhart
…lar (apache#6272) * Update precision in the ONNX strided_slice, update precision of ToScalar * fix tests
Fixes #6263
ToScalar was casting a 64-bit int to a 64-bit float, which lost precision for large values. I switched things to use int64_t/long double where needed to keep precision.
Thanks!
cc: @kevinthesun @zhiics @yongwww @lixiaoquan